Iterative Matrix Factorization Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets

Authors

  • Jie Wang
  • Jun Zhang
Abstract

Modern access to huge amounts of diverse data, with high or low levels of privacy sensitivity, brings a concurrent and increasing demand for preserving data privacy. The challenge is how to protect attribute values without jeopardizing the similarity between the data objects under analysis. In this paper, we extend our previous work on applying matrix decomposition techniques to privacy protection and present a novel algebraic technique, based on iterative methods, for distorting non-negative-valued data. As an unsupervised learning method for uncovering latent features in high-dimensional data, low-rank nonnegative matrix factorization (NMF) is used to preserve the natural non-negativity of the data and to avoid the subtractive interactions between basis vectors and encodings that occur in techniques such as principal component analysis. This is the first work in privacy-preserving data mining to combine nonnegative matrix decomposition with data distortion. Two iterative methods for solving the bound-constrained optimization problem in NMF are compared through experiments on the Wisconsin Breast Cancer Dataset. The overall performance of NMF in terms of distortion level and data utility is compared with our previously proposed SVD-based distortion strategies and with other popular data perturbation methods. Data utility is examined by cross-validating a binary classification with a support vector machine. Our experimental results indicate that, compared with standard data distortion techniques, the proposed NMF-based method is very efficient at balancing data distortion and data utility, and that it offers a feasible and promising solution for high-accuracy privacy-preserving data mining.
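The distortion idea above can be sketched as follows: factor the non-negative data matrix A (objects by attributes) as A ≈ W H with W, H ≥ 0 and a small rank k, then release W H instead of A. The abstract does not name its two iterative solvers, so this sketch assumes two common choices for illustration: the Lee-Seung multiplicative update rules and an alternating projected-gradient scheme with step size 1/L. The function names, the relative Frobenius-norm distortion measure, and the synthetic data are hypothetical stand-ins, not the paper's code, metric, or dataset.

```python
import numpy as np


def nmf_multiplicative(A, k, iters=500, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for min ||A - W H||_F^2 s.t. W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_projected_gradient(A, k, iters=500, seed=0):
    """Alternating projected gradient (step 1/L), projecting onto W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Gradient of 0.5*||A - WH||^2 w.r.t. H is W^T (WH - A); its Lipschitz
        # constant is the spectral norm of W^T W.
        L = np.linalg.norm(W.T @ W, 2) + 1e-12
        H = np.maximum(H - W.T @ (W @ H - A) / L, 0.0)
        # Same step for W, with Lipschitz constant ||H H^T||_2.
        L = np.linalg.norm(H @ H.T, 2) + 1e-12
        W = np.maximum(W - (W @ H - A) @ H.T / L, 0.0)
    return W, H


def nmf_distort(A, k, solver=nmf_multiplicative, **kw):
    """Return the rank-k non-negative approximation W H to release in place of A."""
    W, H = solver(np.asarray(A, dtype=float), k, **kw)
    return W @ H


def relative_difference(A, A_hat):
    """Illustrative distortion level: ||A - A_hat||_F / ||A||_F (an assumption)."""
    return np.linalg.norm(A - A_hat, "fro") / np.linalg.norm(A, "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic integer attributes in [1, 10]; not the real dataset.
    A = rng.integers(1, 11, size=(20, 9)).astype(float)
    for solver in (nmf_multiplicative, nmf_projected_gradient):
        A_hat = nmf_distort(A, k=3, solver=solver)
        print(solver.__name__, round(relative_difference(A, A_hat), 3))
```

Because W H has rank at most k, its error can never beat the truncated SVD of the same rank (Eckart-Young), which is one way to see how the rank trades distortion against fidelity. The SVM cross-validation used in the paper to measure utility is not reproduced here.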

Similar resources

NNMF-Based Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets

Powerful modern access to a huge amount of various data having high or low level of privacy brings out a concurrent increasing demand for preserving data privacy. The challenge is how to protect attribute values without jeopardizing the similarity between data objects under analysis. In this paper, we further our previous work on applying matrix techniques to protect privacy and pre...

Iterative Weighted Non-smooth Non-negative Matrix Factorization for Face Recognition

Non-negative Matrix Factorization (NMF) is a part-based image representation method. It comes from the intuitive idea that entire face image can be constructed by combining several parts. In this paper, we propose a framework for face recognition by finding localized, part-based representations, denoted “Iterative weighted non-smooth non-negative matrix factorization” (IWNS-NMF). A new cost fun...

A new approach for building recommender system using non negative matrix factorization method

Nonnegative Matrix Factorization is a new approach to reduce data dimensions. In this method, by applying the nonnegativity of the matrix data, the matrix is decomposed into components that are more interrelated and divide the data into sections where the data in these sections have a specific relationship. In this paper, we use the nonnegative matrix factorization to decompose the user ratin...

Dual Sparseness Constrained Nonnegative Matrix Factorization for Data Privacy and High Accuracy Utility

In this paper, we propose a data distortion strategy based on dual sparseness constrained Nonnegative Matrix Factorization (NMF). The dual sparseness constrained nonnegative matrix factorization model incorporates attached term constrain and positive symmetric matrix into NMF, which is different from the previous approaches. The goal of our study is data perturbation and we study the distortion...

Computing non-negative tensor factorizations

Nonnegative tensor factorization (NTF) is a technique for computing a parts-based representation of high-dimensional data. NTF excels at exposing latent structures in datasets, and at finding good low-rank approximations to the data. We describe an approach for computing the NTF of a dataset that relies only on iterative linear-algebra techniques and that is comparable in cost to the nonnegativ...


Publication date: 2006